Are All Commas Equal? Detecting Coordination in the Penn Treebank
Abstract
Coordination has always been a difficult phenomenon for linguistic analysis, manual annotation, and automatic analysis alike. There is a considerable body of work on detecting coordination and on improving parsing for this phenomenon. However, most approaches have been restricted to certain types of coordination, such as NP coordination or symmetrical coordination. We present the first approach to classifying punctuation marks according to whether or not they function as separators between conjuncts in a coordination. We show that combining information from a parser with context information yields an F-score of 89.22 on the positive cases.
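The abstract does not list the actual features or learner. Purely as an illustrative sketch, assume each punctuation token is described by the label of the parser constituent that directly dominates it (the "parser information") and by the POS tags in a small window around it (the "context information"). The `Token` fields, `punct_features`, and `f_score_positive` are hypothetical names, not the paper's. The F-score "on positive cases" is taken here as the usual harmonic mean of precision and recall computed only over the separator class.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Token:
    word: str
    pos: str           # Penn Treebank POS tag
    parent_label: str  # hypothetical: label of the parser constituent directly dominating the token


def punct_features(tokens: Sequence[Token], i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical feature set for the punctuation token at position i."""
    # Parser information: the punctuation mark itself and its dominating constituent.
    feats = {"punct": tokens[i].word, "parent": tokens[i].parent_label}
    # Context information: POS tags in a +/- window, padded at sentence edges.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"pos[{off:+d}]"] = tokens[j].pos if 0 <= j < len(tokens) else "<PAD>"
    # Whether a coordinating conjunction (CC) occurs anywhere to the right.
    feats["cc_right"] = str(any(t.pos == "CC" for t in tokens[i + 1:]))
    return feats


def f_score_positive(gold: List[bool], pred: List[bool]) -> float:
    """F1 (in percent) on the positive class, i.e. punctuation that separates conjuncts."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy NP coordination: "apples , pears and plums"
    sent = [Token("apples", "NNS", "NP"), Token(",", ",", "NP"), Token("pears", "NNS", "NP"),
            Token("and", "CC", "NP"), Token("plums", "NNS", "NP")]
    print(punct_features(sent, 1))
```

The feature dictionaries could then be fed to any standard classifier; the choice of a flat feature map is only for readability and says nothing about the model actually used in the paper.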
Similar resources
Chinese sentence segmentation as comma classification
We describe a method for disambiguating Chinese commas that is central to Chinese sentence segmentation. Chinese sentence segmentation is viewed as the detection of loosely coordinated clauses separated by commas. Trained and tested on data derived from the Chinese Treebank, our model achieves a classification accuracy of close to 90% overall, which translates to an F1 score of 70% for detectin...
Modeling Comma Placement in Chinese Text for Better Readability using Linguistic Features and Gaze Information
Comma placements in Chinese text are relatively arbitrary, although there are some syntactic guidelines for them. In this research, we attempt to improve the readability of text by optimizing comma placements through integration of linguistic features of text and gaze features of readers. We design a comma predictor for general Chinese text based on conditional random field models with linguisti...
Coordination Annotation Extension in the Penn Tree Bank
Coordination is an important and common syntactic construction that is not handled well by state-of-the-art parsers. Coordinations in the Penn Treebank are missing internal structure in many cases, do not include explicit marking of the conjuncts, and contain various errors and inconsistencies. In this work, we initiated a manual annotation process to address these issues. We identify the differ...
Annotating Coordination in the Penn Treebank
Finding coordinations provides useful information for many NLP endeavors. However, the task has not received much attention in the literature. A major reason is that the annotation schemes of the major treebanks do not mark coordination reliably. This makes it virtually impossible to detect coordinations in which two conjuncts are separated by punctuation rather than by a coordinating conjun...
Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
Resolving coordination ambiguity is a classic hard problem. This paper looks at coordination disambiguation in complex noun phrases (NPs). Parsers trained on the Penn Treebank are reporting impressive numbers these days, but they don’t do very well on this problem (79%). We explore systems trained using three types of corpora: (1) annotated (e.g. the Penn Treebank), (2) bitexts (e.g. Europarl),...